SUMMARY


A procedure is presented for the determination of airplane model structure
from flight data, including nonlinear aerodynamic effects. The procedure is
based on a modified stepwise regression (MSR) and several decision criteria.

The airplane equations of motion are written in general form, with the
aerodynamic force and moment coefficients expressed as polynomials in response
and input variables. Prior to the development of the MSR, linear and stepwise
regression are briefly introduced. Then the problem of determining airplane
model structure is addressed. The modified stepwise regression is constructed
to force a linear model for the aerodynamic coefficient first, then to add
significant nonlinear terms and delete nonsignificant terms from the model.
The statistical criteria in the stepwise regression for the selection of an
adequate model are complemented by the prediction sum of squares (PRESS)
criterion and by the analysis of residuals. The procedure is demonstrated in
three examples with simulated and real flight data. It is shown that the MSR
with the extended decision criteria performs better than the ordinary stepwise
regression. The MSR is also applied successfully to determine the model
structure from large-amplitude maneuvers in which the data have been
partitioned as a function of angle of attack.


INTRODUCTION 

The estimation of stability and control parameters from flight data has
become a standard procedure for airplanes in flight conditions where the
aerodynamic characteristics can be described in linear terms only and where
no significant external disturbances are present. Interest in poststall and
spin flights has created a need to extend parameter estimation into flight
regimes where nonlinear aerodynamic effects could become pronounced. This
introduces the problem of determining how complex the model should be.
Although a more complex model may be needed to describe the airplane motion
properly, it has not been clear in parameter estimation how model complexity
should best be balanced against the information contained in the
measurements. If too many parameters are sought from a limited amount of data,
reduced accuracy in the estimated parameters can be expected (large error
covariance and/or unrealistic values of some parameters), or attempts to
identify all parameters might fail.

In the field of system identification with general application, a number
of different methods for determining an adequate model have been developed.
Simple statistical methods introduced in reference 1 are connected with the
determination of model order in parameter estimation for single-input,
single-output systems. The advanced statistical methods of reference 2 are
more general and are applicable to multiple-input, multiple-output systems.

One of the first attempts to test the correctness of the model representing
an airplane was introduced in reference 3. The appropriate statistic was
formed by a ratio of two variance estimates from residuals and repeated
measurements of frequency response curves. In reference 4, the analysis of
residuals was recommended for checking the accuracy of the model, and the
sensitivity of a response to parameter changes was suggested for finding the
important parameters in the model. In reference 5, a new criterion for fit
error was presented which combined the sum of squares of residuals and the
number of parameters in the model. Later, in reference 6, a criterion was
developed for finding the optimal number of unknown parameters that satisfies
the expected model response error. All these approaches were either limited
in their use or were presented without any application to real flight data.

A comprehensive treatment of model structure determination based on
stepwise regression is presented in reference 7. This technique was included
in an identification procedure which covered model and parameter selection,
parameter estimation, and model verification. It was applied to simulated
data and, to a limited extent, to flight data. The extension of this research
is covered by reference 8, which also includes a review of various criteria
for the selection of the "best" model. An approach similar to that of
reference 8 was used in reference 9 for the analysis of flight data from
various maneuvers. The estimates obtained were compared with wind-tunnel data
and theoretical predictions, with various degrees of agreement. The
formulation of global models for aerodynamic coefficients was attempted, but
no comparison of measured and predicted responses was given.

The purpose of this report is to reexamine the applicability of stepwise
regression to the determination of airplane model structure from flight data.
Emphasis is given to the development and interpretation of criteria which
would enable the researcher to select the "best" model for a given test run,
and to the verification of the selected model. The report starts with a short
description of linear and stepwise regression. Then the problem of
determining an adequate model for an airplane is discussed, and stepwise
regression, complemented by a constraint and several decision criteria, is
used for selecting the model. The entire procedure for model structure
determination is first tested on several sets of computer-generated data with
different measurement-noise characteristics. Then, real flight-test data are
analyzed and the results are verified. It is shown that the resulting
technique can be used with assurance in determining the structure of
nonlinear models for poststall flights.


SYMBOLS AND ABBREVIATIONS 

$a_0, a_1$ model parameters for intermediate step in stepwise regression

$a_x, a_y, a_z$ longitudinal, lateral, and vertical accelerations, g units

$B = \rho S\bar{c}/4m$

$b$ wing span, m

$C_l$ rolling-moment coefficient, $M_X/\bar{q}Sb$

$C_m$ pitching-moment coefficient, $M_Y/\bar{q}S\bar{c}$

$C_n$ yawing-moment coefficient, $M_Z/\bar{q}Sb$

$C_X$ longitudinal-force coefficient, $F_X/\bar{q}S$

$C_Y$ lateral-force coefficient, $F_Y/\bar{q}S$

$C_Z$ vertical-force coefficient, $F_Z/\bar{q}S$

$\bar{c}$ wing mean aerodynamic chord, m

$E\{\ \}$ expectation operator

$F$ F-statistic

$F_p$ F-statistic used in partial F-test

$F_X, F_Y, F_Z$ forces along longitudinal, lateral, and vertical body axes,
respectively, N

$g$ acceleration due to gravity, m/sec^2

$H_0, H_1$ null and alternative hypotheses

$k$ lag number in autocorrelation function

$I$ identity matrix

$I_X, I_Y, I_Z$ moments of inertia about longitudinal, lateral, and vertical
body axes, respectively, kg-m^2

$I_{XZ}$ product of inertia, kg-m^2

$i$ quantity at $i$th interval

$M_X, M_Y, M_Z$ rolling, pitching, and yawing moments, respectively, N-m

$m$ mass, kg

$N$ number of data points

$n$ number of unknown parameters

$p$ roll rate, rad/sec or deg/sec

$q$ pitch rate, rad/sec or deg/sec

$\bar{q} = \frac{1}{2}\rho V^2$ kinetic pressure, Pa

$R_\varepsilon$ autocorrelation function of residuals

$R^2$ squared multiple correlation coefficient

$r$ yaw rate, rad/sec or deg/sec

$r_{jy}$ partial correlation coefficient

$r_{jy \cdot 1}$ partial correlation coefficient after variable $x_1$ is
included in model

$S$ wing area, m^2

$S_{jy}, S_{jj}, S_{yy}$ sums of squares defined by equations (12)

$s$ standard error

$s^2$ estimated variance

$T_X, T_Z$ thrust components along longitudinal and vertical axes,
respectively, N

$t$ time, sec

$V$ airspeed, m/sec

$\mathrm{Var}\{\ \}$ variance operator

$X$ $N \times n$ matrix of independent variables

$x$ independent variable in regression equation

$Y$ $N \times 1$ vector of dependent variables

$y$ dependent variable in regression equation

$y^*$ dependent variable used in intermediate step of stepwise regression

$z$ independent variable used in intermediate step of stepwise regression

$\alpha$ angle of attack, rad or deg

$\alpha_p$ significance (risk) level used with F-statistic

$\beta$ sideslip angle, rad or deg

$\delta_a$ aileron deflection, rad or deg

$\delta_e$ elevator deflection, rad or deg

$\delta_r$ rudder deflection, rad or deg

$\varepsilon$ equation error (measurement noise)

$\theta$ $n \times 1$ vector of unknown parameters

$\theta_j$ $j$th element of vector of unknown parameters

$\Theta$ pitch angle, rad or deg

$\nu_1, \nu_2$ degrees of freedom for numerator and denominator of
F-statistic, respectively

$\rho$ air density, kg/m^3

$\sigma$ standard deviation

$\sigma^2$ variance of measurement noise

Subscripts: 

j index of parameters and independent variables 

$\ell$ $\ell$th model equation

0 trimmed condition 

Superscripts: 

T transpose matrix 

-1 inverse matrix 

Abbreviations: 

LS least squares 

ML maximum likelihood 

MSPE mean square prediction error 

MSR modified stepwise regression 

PRESS prediction sum of squares 

RSS residual sum of squares 

Aerodynamic derivatives referenced to a system of body axes with the origin at 
the airplane center of gravity: 
The derivatives $C'_{m,0}$, $C_{m_\alpha}$, and $C^*_{m_{p\alpha}}$ are defined in
appendix A.

A bar over a symbol denotes the mean value. A dot above a symbol denotes a
derivative with respect to time. A circumflex denotes an estimated value.


LINEAR REGRESSION 

The linear regression technique is employed to estimate a functional
relationship between a dependent variable and one or more independent
variables. It is assumed that the dependent variable can be closely
approximated as a linear combination of the independent variables. For
airplane models, the resultant aerodynamic force and moment are expressed by
means of the aerodynamic model equations, which may be written as


$$y(t) = \theta_0 + \theta_1 x_1(t) + \dots + \theta_{n-1} x_{n-1}(t) \qquad (1)$$


In this equation, $y(t)$ represents the resultant coefficient of aerodynamic
force or moment (the dependent variable), $\theta_1$ to $\theta_{n-1}$ are the
stability and control derivatives, $\theta_0$ is the value of the particular
coefficient corresponding to the initial steady-flight conditions, and $x_1$
to $x_{n-1}$ are the airplane state and control variables (the independent
variables). The variables $x_1$ to $x_{n-1}$ may also include any combination
of the state and control variables, such as products and powers.

Let us assume that a sequence of $N$ observations on both $y$ and $x$ has
been made at times $t_1, t_2, \dots, t_N$. If the measured data are denoted by
$y(i)$ and $x_1(i), x_2(i), \dots, x_{n-1}(i)$, where $i = 1, 2, \dots, N$,
then these data can be related by the following set of $N$ linear equations:


$$y(i) = \theta_0 + \theta_1 x_1(i) + \dots + \theta_{n-1} x_{n-1}(i) + \varepsilon(i) \qquad (2)$$


Because equation (1) is only an approximation of the actual aerodynamic
relations, the right-hand side of equation (2) includes an additional term
$\varepsilon(i)$, often referred to as the equation error. For $N > n$, the
unknown parameters can be estimated from the measurements by the method of
least squares as


$$\hat{\theta} = (X^T X)^{-1} X^T Y \qquad (3)$$


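where $\hat{\theta}$ is the $n \times 1$ vector of parameter estimates, $Y$ is
the $N \times 1$ vector of measured values of $y(i)$, and $X$ is the
$N \times n$ matrix of measured independent variables; its first column
consists of ones, corresponding to $\theta_0$.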
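As an illustration of equation (3), the following is a minimal sketch assuming Python with NumPy; the function name `ls_estimate`, the regressors, the "true" parameter values, and the noise level are all hypothetical and chosen only to show the computation. The normal equations are solved directly instead of forming $(X^T X)^{-1}$ explicitly, which is numerically safer.

```python
import numpy as np

def ls_estimate(X, Y):
    """Least-squares estimate of equation (3), theta_hat = (X^T X)^-1 X^T Y.

    X is the N x n matrix of independent variables (first column all ones for
    theta_0); Y is the length-N vector of measured dependent-variable values.
    """
    # Solving the normal equations avoids an explicit matrix inverse.
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Hypothetical data: y = 0.1 - 0.5*x1 + 0.2*x2 + small noise
rng = np.random.default_rng(0)
N = 200
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1, x2])
theta_true = np.array([0.1, -0.5, 0.2])
Y = X @ theta_true + 0.01 * rng.normal(size=N)

theta_hat = ls_estimate(X, Y)
print(theta_hat)  # close to theta_true
```

For badly conditioned regressors, a QR- or SVD-based solver such as `numpy.linalg.lstsq` is a safer choice and gives the same estimate for well-posed data like this.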
The properties of the least-squares (LS) estimates depend upon the postulated
assumptions concerning the measured dependent variables and equation errors.
These assumptions are

1. $\varepsilon$ is a stationary vector with zero mean value

2. $\varepsilon$ is uncorrelated with $X$

3. $X$ is a deterministic quantity (i.e., the state and input variables are
   measured without errors)

4. $\varepsilon(i)$ is identically distributed and uncorrelated with zero mean
   and variance $\sigma^2$

Under assumptions 1 and 2, the LS estimates are unbiased. When assumptions 3 
and 4 are considered, it can be shown that the LS estimates are also consistent 
and efficient (for example, see refs. 10 and 11). The covariance matrix of 
parameter errors has the form 


$$E\{(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T\} = \sigma^2 (X^T X)^{-1} \qquad (4)$$


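For the estimate of this covariance matrix, $\sigma^2$ is replaced by its
estimate

$$s^2 = \frac{1}{N - n} \sum_{i=1}^{N} \hat{\varepsilon}^2(i) \qquad (5)$$

where

$$\hat{\varepsilon}(i) = y(i) - \hat{y}(i)$$

and

$$\hat{y}(i) = \hat{\theta}_0 + \hat{\theta}_1 x_1(i) + \dots + \hat{\theta}_{n-1} x_{n-1}(i) \qquad (6)$$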
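Equations (4) to (6) can be combined into a single routine. The sketch below assumes Python with NumPy; `ls_with_covariance` is a hypothetical helper name and the simulated data are invented. It returns the estimates, the residual variance $s^2$ of equation (5), and the estimated covariance matrix obtained by substituting $s^2$ for $\sigma^2$ in equation (4).

```python
import numpy as np

def ls_with_covariance(X, Y):
    """Return (theta_hat, s2, cov) following equations (3) to (6).

    s2 = sum(residual^2) / (N - n) replaces sigma^2 in equation (4),
    giving the estimated covariance matrix s2 * (X^T X)^-1.
    """
    N, n = X.shape
    XtX = X.T @ X
    theta_hat = np.linalg.solve(XtX, X.T @ Y)   # equation (3)
    residuals = Y - X @ theta_hat               # equation (6)
    s2 = residuals @ residuals / (N - n)        # equation (5)
    cov = s2 * np.linalg.inv(XtX)               # equation (4), sigma^2 -> s2
    return theta_hat, s2, cov

# Hypothetical data with known noise standard deviation sigma = 0.05
rng = np.random.default_rng(1)
N = 500
x1 = rng.normal(size=N)
X = np.column_stack([np.ones(N), x1])
sigma = 0.05
Y = X @ np.array([0.2, 1.5]) + sigma * rng.normal(size=N)

theta_hat, s2, cov = ls_with_covariance(X, Y)
std_errors = np.sqrt(np.diag(cov))
print(theta_hat, s2, std_errors)   # s2 should be near sigma**2 = 0.0025
```

With simulated data that satisfy assumptions 1 to 4, $s^2$ lands close to the true $\sigma^2$; with real flight data, as noted later in this section, the resulting variances tend to be optimistic.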


When assumptions 1 to 4 are extended by the assumption of a normal
distribution for $\varepsilon(i)$, confidence intervals for the estimates can be
found and some hypothesis tests can be employed. (See ref. 12.) Two of these
tests will be used later in the report. The first one is the test of overall
regression, with the null and alternative hypotheses formulated as

$$H_0: \theta_1 = \theta_2 = \dots = \theta_{n-1} = 0$$

$$H_1: \text{not all } \theta_j = 0$$
The null hypothesis is rejected if

$$F > F(\nu_1, \nu_2, \alpha_p)$$

where

$$F = \frac{\hat{\theta}^T X^T Y - N\bar{y}^2}{(n - 1)s^2} \qquad (7)$$

is a random variable having an F-distribution with $\nu_1 = n - 1$ and
$\nu_2 = N - n$ degrees of freedom, and where $F(\nu_1, \nu_2, \alpha_p)$ are
tabulated values of the F-distribution for the significance level $\alpha_p$.

The second test is of the significance of individual terms in the regression
(a partial F-test). The hypotheses are

$$H_0: \theta_j = 0$$

$$H_1: \theta_j \neq 0$$

and the testing criterion is

$$F_p = \frac{\hat{\theta}_j^2}{s^2(\hat{\theta}_j)} \qquad (8)$$

The null hypothesis is rejected if $F_p > F(\nu_1, \nu_2, \alpha_p)$, where
$\nu_1 = 1$ and $\nu_2 = N - n$. In equation (7),

$$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y(i)$$

and, in equation (8), $s^2(\hat{\theta}_j)$ is the variance estimate of
$\hat{\theta}_j$.
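Both tests can be computed from one least-squares fit. The sketch below assumes Python with NumPy; `f_statistics` is a hypothetical helper, and the data (one relevant regressor, one irrelevant candidate) are invented. The variance estimate $s^2(\hat{\theta}_j)$ is taken as the $j$th diagonal element of $s^2 (X^T X)^{-1}$, consistent with equation (4).

```python
import numpy as np

def f_statistics(X, Y):
    """Overall F of equation (7) and partial F_p of equation (8).

    X: N x n matrix with a leading column of ones (theta_0).
    Returns (F, Fp) where Fp[j] = theta_j^2 / s^2(theta_j) for every column j
    (Fp[0] belongs to theta_0 and is not used in the tests of the text).
    """
    N, n = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    theta = XtX_inv @ X.T @ Y
    resid = Y - X @ theta
    s2 = resid @ resid / (N - n)                 # equation (5)
    y_bar = Y.mean()
    ss_reg = theta @ X.T @ Y - N * y_bar**2      # numerator of equation (7)
    F = ss_reg / ((n - 1) * s2)
    Fp = theta**2 / (s2 * np.diag(XtX_inv))      # s^2(theta_j) = s2 * [(X^T X)^-1]_jj
    return F, Fp

# Hypothetical data: y depends on x1 only; x2 is an irrelevant candidate.
rng = np.random.default_rng(4)
N = 300
x1, x2 = rng.normal(size=N), rng.normal(size=N)
Y = 1.0 + 2.0 * x1 + 0.3 * rng.normal(size=N)
X = np.column_stack([np.ones(N), x1, x2])
F, Fp = f_statistics(X, Y)
print(F, Fp[1:])   # F and Fp for x1 are large; Fp for x2 is typically small
```

A convenient check: with a single regressor ($n = 2$), the overall $F$ of equation (7) and the partial $F_p$ of that regressor are identical.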

The random variable $F$ specified by equation (7) is related to the squared
multiple correlation coefficient

$$R^2 = \frac{\sum_{i=1}^{N} [\hat{y}(i) - \bar{y}]^2}{\sum_{i=1}^{N} [y(i) - \bar{y}]^2} = \frac{\hat{\theta}^T X^T Y - N\bar{y}^2}{Y^T Y - N\bar{y}^2} \qquad (9)$$

by the equation

$$F = \frac{N - n}{n - 1} \, \frac{R^2}{1 - R^2} \qquad (10)$$
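Equation (10) follows from splitting the total sum of squares into regression and residual parts, which holds when the model contains $\theta_0$. The sketch below, assuming Python with NumPy and invented data, computes $R^2$ from equation (9) and checks that equation (10) reproduces the ratio of the regression mean square to the residual mean square; `r_squared` and `f_from_r2` are hypothetical helper names.

```python
import numpy as np

def r_squared(X, Y):
    """Squared multiple correlation coefficient R^2 of equation (9)."""
    theta = np.linalg.lstsq(X, Y, rcond=None)[0]
    y_hat = X @ theta
    y_bar = Y.mean()
    return np.sum((y_hat - y_bar)**2) / np.sum((Y - y_bar)**2)

def f_from_r2(R2, N, n):
    """Overall F computed from R^2 through equation (10)."""
    return (N - n) / (n - 1) * R2 / (1.0 - R2)

# Hypothetical data set with n = 3 parameters (theta_0, theta_1, theta_2)
rng = np.random.default_rng(2)
N, n = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
Y = X @ np.array([0.5, 1.0, -0.3]) + 0.5 * rng.normal(size=N)

R2 = r_squared(X, Y)
theta = np.linalg.lstsq(X, Y, rcond=None)[0]
ss_reg = np.sum((X @ theta - Y.mean())**2)      # regression sum of squares
ss_res = np.sum((Y - X @ theta)**2)             # residual sum of squares
F_direct = (ss_reg / (n - 1)) / (ss_res / (N - n))
print(R2, F_direct, f_from_r2(R2, N, n))        # the two F values agree
```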


In an actual experiment, assumptions 1 to 4 and the normality of
$\varepsilon(i)$ are not generally met. Because of the measurement errors in
the independent variables, the LS estimates are asymptotically biased,
inconsistent, and inefficient. (See refs. 10 and 11.) However, experience with
flight data indicates that the LS estimates could still be accurate enough and
could even be comparable to those from the maximum likelihood method, which is
expected to give consistent and asymptotically unbiased results. It is also
believed that the F-tests can be applied to real flight data because of the
robustness of the F-statistic with respect to the normality assumption. On
the other hand, equation (4) for the covariance matrix usually gives
optimistic (too small) values for the parameter variances.


STEPWISE REGRESSION 

Stepwise regression is a procedure which inserts independent variables into
the regression model, one at a time, until the regression equation is
satisfactory. The order of insertion is determined by using the partial
correlation coefficient as a measure of the importance of variables not yet in
the regression equation. The procedure starts with the postulation of a
regression model given by equation (2). The first independent variable from
the postulated model is chosen as the one which is most closely correlated
with $y$. The correlation coefficient is given by the expression


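$$r_{jy} = \frac{S_{jy}}{(S_{jj} S_{yy})^{1/2}} \qquad (11)$$

where

$$S_{jy} = \sum_{i=1}^{N} [x_j(i) - \bar{x}_j][y(i) - \bar{y}] \qquad (12a)$$

$$S_{jj} = \sum_{i=1}^{N} [x_j(i) - \bar{x}_j]^2 \qquad (12b)$$

$$S_{yy} = \sum_{i=1}^{N} [y(i) - \bar{y}]^2 \qquad (12c)$$

$$\bar{x}_j = \frac{1}{N} \sum_{i=1}^{N} x_j(i) \qquad (12d)$$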
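Equations (11) and (12) translate directly into code. The sketch below assumes Python with NumPy; `corr_coef` and `first_variable` are hypothetical names, and the candidate data are invented. It also shows the first stepwise selection: the candidate with the largest $|r_{jy}|$ enters first.

```python
import numpy as np

def corr_coef(x, y):
    """Correlation coefficient r_jy from equations (11) and (12)."""
    dx = x - x.mean()                   # x_j(i) - x_bar_j, with x_bar_j from (12d)
    dy = y - y.mean()
    S_jy = np.sum(dx * dy)              # (12a)
    S_jj = np.sum(dx * dx)              # (12b)
    S_yy = np.sum(dy * dy)              # (12c)
    return S_jy / np.sqrt(S_jj * S_yy)  # (11)

def first_variable(candidates, y):
    """Column index of the candidate most closely correlated with y (largest |r_jy|)."""
    r = [abs(corr_coef(candidates[:, j], y)) for j in range(candidates.shape[1])]
    return int(np.argmax(r))

# Hypothetical candidates: y is driven mainly by column 1
rng = np.random.default_rng(5)
N = 150
candidates = rng.normal(size=(N, 3))
y = 3.0 * candidates[:, 1] + 0.5 * rng.normal(size=N)
print(first_variable(candidates, y))   # expected: 1
```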
If $x_j$ is selected as $x_1$, then the model

$$y = \theta_0 + \theta_1 x_1 + \varepsilon \qquad (13)$$

is used to fit the data. A new independent variable $z_2$ is constructed by
finding the residuals of $x_2$ after regressing it on $x_1$, that is, the
residuals from fitting the model

$$x_2 = a_0 + a_1 x_1 + \varepsilon \qquad (14)$$

The variable $z_2$ is, therefore, given as

$$z_2 = x_2 - \hat{a}_0 - \hat{a}_1 x_1 \qquad (15)$$

Similarly, the variables $z_3, z_4, \dots, z_{n-1}$ are formed by regressing
the variable $x_3$ on $x_1$, $x_4$ on $x_1$, and so forth. A new dependent
variable $y^*$ is represented by the residuals of $y$ regressed on $x_1$ using
the model given by equation (13); that is,

$$y^* = y - \hat{\theta}_0 - \hat{\theta}_1 x_1 \qquad (16)$$


In the next step, a new set of correlations involving the variables
$y^*, z_2, z_3, \dots, z_{n-1}$ is formulated. These partial correlations can be
written as $r_{jy \cdot 1}$, meaning that the correlations of $z_j$ and $y^*$
are related to the model containing the variable $x_1$. The expression for
the partial correlation coefficients $r_{jy \cdot 1}$ is given by equations
(11) and (12) after replacing $y$ and $x_j$ by $y^*$ and $z_j$. The next
variable added to the regression model is the variable $x_j$ whose partial
correlation coefficient is the greatest in absolute value. If the second
independent variable selected in this way is $x_2$, then the third stage of
the selection procedure involves partial correlations of the form
$r_{jy \cdot 12}$, that is, the correlations between the residuals of $x_j$
regressed on $x_1$ and $x_2$ and the residuals of $y$ regressed on $x_1$ and
$x_2$.
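The residual construction of equations (13) to (16) gives the partial correlation directly. This sketch assumes Python with NumPy; `residualize` and `partial_corr` are hypothetical helpers, and the data are invented. With no variables yet in the model it reduces to the ordinary correlation of equation (11).

```python
import numpy as np

def residualize(v, X_in):
    """Residuals of v after LS regression on [1, X_in] (equations (13) to (16))."""
    A = np.column_stack([np.ones(len(v)), X_in])
    coef = np.linalg.lstsq(A, v, rcond=None)[0]
    return v - A @ coef

def partial_corr(x_j, y, X_in):
    """Partial correlation of x_j and y given the columns of X_in:
    the correlation between z_j and y*."""
    z_j = residualize(x_j, X_in)
    y_star = residualize(y, X_in)
    # z_j and y* have zero mean (an intercept is fitted), so equation (11)
    # reduces to a normalized inner product.
    return (z_j @ y_star) / np.sqrt((z_j @ z_j) * (y_star @ y_star))

# Hypothetical noise-free example: y = x1 + x2, so y* equals z_2 exactly
rng = np.random.default_rng(6)
N = 200
x1, x2 = rng.normal(size=N), rng.normal(size=N)
y = x1 + x2
r = partial_corr(x2, y, x1[:, None])
print(r)   # 1.0 up to rounding
```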

At every step of the regression, the variables incorporated into the model
in previous stages and the new variable entering the model are reexamined. The
partial F-criterion given by equation (8) is evaluated for each variable and
compared with the preselected percentage point of the appropriate
F-distribution. This provides a judgment on the contribution made by each
variable. Any variable which provides a nonsignificant contribution (small
value of $F_p$) is removed from the model. A variable which may have been the
best single variable to enter at an early stage may, at a later stage, be
superfluous because of the relationship between it and other variables now in
the regression. The process of selecting and checking variables continues
until no more variables are admitted to the equation and no more are rejected.
The complete computing scheme for the stepwise regression can be found in
reference 12.
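The entry and removal cycle described above can be sketched as follows. This is not the computing scheme of reference 12; it is a compact illustration assuming Python with NumPy, with hypothetical function names, hypothetical thresholds `f_in` and `f_out` (both set to 7 here), and a `max_steps` guard added because an entry/removal rule of this kind can cycle.

```python
import numpy as np

def _design(X, cols):
    """Regressor matrix [1, X[:, cols]] for the current model."""
    return np.column_stack([np.ones(X.shape[0]), X[:, cols]])

def partial_f(X, y, cols):
    """Partial F_p (equation (8)) of each term in cols, intercept excluded."""
    A = _design(X, cols)
    AtA_inv = np.linalg.inv(A.T @ A)
    theta = AtA_inv @ A.T @ y
    resid = y - A @ theta
    s2 = resid @ resid / (len(y) - A.shape[1])
    return (theta**2 / (s2 * np.diag(AtA_inv)))[1:]

def abs_partial_corr(X, y, cols, j):
    """|partial correlation| of candidate j with y: correlation of z_j and y*."""
    A = _design(X, cols)
    def resid(v):
        return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]
    z, y_star = resid(X[:, j]), resid(y)
    return abs(z @ y_star) / np.sqrt((z @ z) * (y_star @ y_star))

def stepwise(X, y, f_in=7.0, f_out=7.0, max_steps=50):
    """Sketch of ordinary stepwise regression over the candidate columns of X.

    Entry: the candidate with the largest |partial correlation| enters if its
    partial F_p exceeds f_in. Removal: after each entry, the term with the
    smallest F_p is dropped if it falls below f_out. Returns selected columns.
    """
    selected = []
    for _ in range(max_steps):   # guard against cycling between entry and removal
        pool = [j for j in range(X.shape[1]) if j not in selected]
        if not pool:
            break
        best = max(pool, key=lambda j: abs_partial_corr(X, y, selected, j))
        if partial_f(X, y, selected + [best])[-1] < f_in:
            break                # best remaining candidate is not significant
        selected.append(best)
        fp = partial_f(X, y, selected)
        k = int(np.argmin(fp))
        if fp[k] < f_out:
            selected.pop(k)      # a term that became superfluous is removed
    return selected

# Hypothetical data: five candidate regressors, only columns 0 and 2 matter
rng = np.random.default_rng(7)
N = 300
X = rng.normal(size=(N, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=N)
sel = stepwise(X, y)
print(sel)   # expected to contain columns 0 and 2
```

Setting `f_out` somewhat below `f_in` is a common way to reduce cycling; equal thresholds are used here only for simplicity.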


MODEL STRUCTURE DETERMINATION 

A model for a system is an operator which converts the given input to the
system into the response of the system. In this report, a model will be
described by a model structure (analytical representation of a model) and
model parameters (coefficients in the analytical representation). The correct
model of an airplane is, in general, unknown and unknowable. Therefore, a
major problem in system identification is the selection, from measured data,
of an adequate model. An adequate model is a model which sufficiently fits the
data, facilitates the successful estimation of unknown parameters, and has
good prediction capabilities.

For the model structure determination procedure, it will be assumed that

(a) the general equations of motion for a rigid body adequately define the
airplane motion

(b) the model for the aerodynamic force and moment coefficients can be
represented by multivariable polynomials in response and control variables;
the parameters in these equations are the coefficients of the Taylor series
expansion around the values corresponding to the initial steady-state flight

(c) some of the linear terms in the Taylor series expansion make the largest
contribution to the aerodynamic functions, followed by the higher order terms

The second assumption is an extension of the concept of airplane stability
and control derivatives in the linear aerodynamic model equations. The third
assumption will result in a constraint on the selection of significant terms
in the regression equation. This constraint is explained in the following
paragraph and substantiated by the examples presented. It also provides
information about the performance of a linear model.

The determination of an adequate model using stepwise regression includes
three steps: the postulation of terms which might enter the final model, the
selection of an adequate model, and the verification of the selected model.
The postulated model forms for the longitudinal and lateral aerodynamics are
presented in appendix A. The computing scheme for the selection of an adequate
model is modified from that in reference 12. The linear terms in the model are
examined first. They enter the regression according to their partial
correlation coefficients and are kept in the model regardless of the value of
$F_p$. This means that no hypothesis testing is applied during this part of
the procedure. When all linear terms are included, the procedure continues as
for the ordinary stepwise regression. The postulated nonlinear terms are
searched, and the null hypothesis concerning their significance, and the
significance of all terms already in the model (linear and nonlinear), is
tested. The stepwise regression technique with this constraint will be
referred to as the modified stepwise regression (MSR).
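One possible reading of the MSR procedure is sketched below, assuming Python with NumPy. It is an illustration, not the authors' program: the function names, the threshold `f_crit = 7` for the partial F-test, the `max_steps` guard, and the test data are all assumptions. Stage 1 forces every linear term in by partial correlation with no F-test; stage 2 runs ordinary stepwise entry over the nonlinear terms and subjects every term in the model, linear ones included, to the partial F-test.

```python
import numpy as np

def _design(X, cols):
    """Regressor matrix [1, X[:, cols]] for the current model."""
    return np.column_stack([np.ones(X.shape[0]), X[:, cols]])

def _partial_f(X, y, cols):
    """Partial F_p (equation (8)) of each term in cols, intercept excluded."""
    A = _design(X, cols)
    AtA_inv = np.linalg.inv(A.T @ A)
    theta = AtA_inv @ A.T @ y
    resid = y - A @ theta
    s2 = resid @ resid / (len(y) - A.shape[1])
    return (theta**2 / (s2 * np.diag(AtA_inv)))[1:]

def _abs_partial_corr(X, y, cols, j):
    """|partial correlation| of candidate column j with y, given cols."""
    A = _design(X, cols)
    def resid(v):
        return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]
    z, y_star = resid(X[:, j]), resid(y)
    return abs(z @ y_star) / np.sqrt((z @ z) * (y_star @ y_star))

def msr(X, y, linear, nonlinear, f_crit=7.0, max_steps=50):
    """Sketch of the modified stepwise regression.

    linear, nonlinear: column indices of X holding the postulated terms.
    Returns the selected columns in order of entry.
    """
    model, pool = [], list(linear)
    while pool:                                  # stage 1: no hypothesis testing
        j = max(pool, key=lambda c: _abs_partial_corr(X, y, model, c))
        model.append(j)
        pool.remove(j)
    for _ in range(max_steps):                   # stage 2: ordinary stepwise
        pool = [c for c in nonlinear if c not in model]
        if not pool:
            break
        j = max(pool, key=lambda c: _abs_partial_corr(X, y, model, c))
        if _partial_f(X, y, model + [j])[-1] < f_crit:
            break
        model.append(j)
        fp = _partial_f(X, y, model)
        k = int(np.argmin(fp))
        if fp[k] < f_crit:
            model.pop(k)                         # may drop a linear term
    return model

# Hypothetical lateral-type data: one true nonlinear term, p*alpha
rng = np.random.default_rng(8)
N = 400
beta, p, r, alpha = rng.normal(size=(4, N))
y = 0.5 * beta - 0.3 * p + 0.2 * r + 0.8 * p * alpha + 0.05 * rng.normal(size=N)
X = np.column_stack([beta, p, r, p * alpha, r * alpha])
model = msr(X, y, linear=[0, 1, 2], nonlinear=[3, 4])
print(model)   # the three linear terms first, then column 3
```

In this sketch a forced linear term is tested only after some nonlinear term has entered; the text above does not state whether a separate final check is made when no nonlinear term qualifies.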

As indicated in the previous section, the tabulated values of
$F(1, N-n, \alpha_p)$ and $F(n-1, N-n, \alpha_p)$ depend on the number of data
points, the number of parameters in the model, and the selected risk level
$\alpha_p$. For $N > 100$, the effect of $n$ on the tabulated values of $F$ is
small; therefore, $F(1, N-n, 0.01)$ is taken as 7, regardless of $N$ and $n$.
The tabulated values of $F(n-1, N-n, \alpha_p)$ for $N > 100$ and
$\alpha_p = 0.01$ vary approximately from 3.0 to 2.3. It is indicated in
reference 13, however, that, in order for the model selected to be regarded
as a satisfactory predictor, the observed F-values not only should exceed the
selected percentage point of the F-distribution but should be about four times
the selected percentage point. Based on these observations, the value of $F$
used for comparison is selected as 12 (about four times 3.0).
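The tabulated percentage points quoted above can be reproduced with SciPy's F-distribution, where `scipy.stats.f.ppf(q, dfn, dfd)` returns the value below which a fraction `q` of the distribution lies. The sketch below assumes SciPy is available; `critical_values` is a hypothetical helper and the $(N, n)$ pairs are arbitrary examples.

```python
from scipy import stats

def critical_values(N, n, alpha_p=0.01):
    """Upper percentage points F(1, N-n, alpha_p) and F(n-1, N-n, alpha_p)."""
    partial = stats.f.ppf(1.0 - alpha_p, 1, N - n)       # partial F-test, equation (8)
    overall = stats.f.ppf(1.0 - alpha_p, n - 1, N - n)   # overall test, equation (7)
    return partial, overall

for N, n in [(101, 6), (351, 6), (351, 11)]:
    partial, overall = critical_values(N, n)
    print(f"N={N:3d} n={n:2d}  F(1,N-n)={partial:.2f}  "
          f"F(n-1,N-n)={overall:.2f}  4x={4 * overall:.1f}")
```

Both percentage points decrease as $N$ and $n$ grow, which is why fixed, slightly conservative comparison values are practical for data sets of a few hundred points.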

Experience with several test runs showed that a model based only on the
statistical significance of individual parameters in the regression equation
can still include too many parameters. It is, therefore, recommended that more
quantities and their variations be examined as possible criteria for the
selection of an adequate model. Several quantities which could be examined
include the following:

(a) The computed values of $F_p$ for each parameter in the model. Because
$F_p$ is the inverse of the relative parameter variance, it should have
maximum values for an adequate model.

(b) The computed value of F, which is given as the ratio of regression 
mean square to residual mean square. The model with the maximum F-value has 
already been recommended in reference 7 as the "best" one for a given set of 
data. 


(c) The value of the squared multiple correlation coefficient $R^2$, which
can be interpreted as measuring the proportion of the variation explained by
the terms other than $\theta_0$ in the model. However, the improvement in $R^2$
due to adding new terms to the model must have some real significance and
should not reflect only the effect of the increased number of parameters. The
value of $R^2$ varies from 0 to 1 (perfect fit). Its variation is often
expressed in percent.


(d) The value of the residual sum of squares (RSS), defined for the $\ell$th
model as

$$\mathrm{RSS}_\ell = \sum_{i=1}^{N} [y(i) - \hat{y}(i)]_\ell^2 \qquad (17)$$

This value is the measure of the "goodness of fit," and, for its improvement,
the same note applies as that for $R^2$.
(e) The value of the residual variance $s^2(\varepsilon)$ estimated from

$$s^2(\varepsilon) = \frac{\mathrm{RSS}}{N - n} \qquad (18)$$

which should be compared with an unbiased estimate of the variance
$\sigma^2(\varepsilon)$, if available.

(f) The residuals $\hat{\varepsilon}(i)$. For an adequate model, their time
history should be close to a random sequence which is uncorrelated and
Gaussian.
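Criteria (b) to (f) can be evaluated together for each candidate model. The sketch below assumes Python with NumPy; `selection_criteria` and `autocorrelation` are hypothetical helpers, the two candidate models are invented, and the residual autocorrelation is normalized by its lag-zero value (one common convention, assumed here). Using $R^2 = 1 - \mathrm{RSS}/\sum[y(i)-\bar{y}]^2$ is equivalent to equation (9) only because each design matrix includes a column of ones.

```python
import numpy as np

def selection_criteria(A, y):
    """Criteria (b)-(e) for one candidate model; A includes a column of ones."""
    N, n = A.shape
    theta = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ theta
    rss = resid @ resid                        # equation (17)
    s2 = rss / (N - n)                         # equation (18)
    sst = np.sum((y - y.mean())**2)
    R2 = 1.0 - rss / sst                       # equation (9), intercept present
    F = (N - n) / (n - 1) * R2 / (1.0 - R2)    # equation (10)
    return {"R2": R2, "F": F, "RSS": rss, "s2": s2, "resid": resid}

def autocorrelation(resid, max_lag):
    """Normalized sample autocorrelation of the residuals for lags 0..max_lag."""
    e = resid - resid.mean()
    r0 = e @ e
    return np.array([(e[: len(e) - k] @ e[k:]) / r0 for k in range(max_lag + 1)])

# Hypothetical data: y depends on x[:, 0] only
rng = np.random.default_rng(3)
N = 1000
x = rng.normal(size=(N, 2))
y = 1.0 + 0.7 * x[:, 0] + 0.1 * rng.normal(size=N)
A_small = np.column_stack([np.ones(N), x[:, 0]])
A_full = np.column_stack([np.ones(N), x])
crit_small = selection_criteria(A_small, y)
crit_full = selection_criteria(A_full, y)
ac = autocorrelation(crit_small["resid"], max_lag=10)
print(crit_small["R2"], crit_small["F"], crit_full["F"], ac[:3])
```

Adding the irrelevant term can only lower the RSS, but it dilutes $F$; this is the behavior that criteria (b) to (d) are meant to expose.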

Optimal values of these quantities may provide criteria which will guarantee
a good fit to the data, but they will not necessarily select a model which is
a good predictor. However, there is a rule commonly used in choosing a model
which will be a good predictor. It is known as the "principle of parsimony,"
and it can be stated (see ref. 14) as follows: given two models fitted to the
same data with residual variances $\hat{\sigma}_1^2(\varepsilon)$ and
$\hat{\sigma}_2^2(\varepsilon)$ which are close to each other, choose the model
which involves the smaller number of parameters. The prediction sum of
squares (PRESS) criterion for the selection of a parsimonious model is
proposed in reference 15. The PRESS, associated with the $\ell$th subset of
model parameters, is defined as

$$\mathrm{PRESS}_\ell = \sum_{i=1}^{N} \{y(i) - \hat{y}[i \mid x(1), \dots, x(i-1), x(i+1), \dots, x(N)]_\ell\}^2$$

where $\hat{y}(i \mid \dots)_\ell$ is the estimate of $E\{y(i)\}$ using the
$\ell$th subset and excluding the $i$th observation. Some notes on the
development of this criterion, its interpretation, and its computation are
given in appendix B. In the following examples, the $R^2$, $F$, and PRESS
values computed at each entry to the MSR procedure will be used for model
selection. These values will be complemented by the estimates of the
autocorrelation functions of residuals.
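PRESS does not require $N$ separate refits: for least squares, the leave-one-out prediction error equals $\hat{\varepsilon}(i)/(1 - h_{ii})$, where $h_{ii}$ is the $i$th diagonal element of the hat matrix $X(X^T X)^{-1}X^T$. This is a standard identity; whether appendix B uses exactly this form is not shown here. The sketch assumes Python with NumPy, uses hypothetical helper names, and checks the shortcut against the literal definition on invented data.

```python
import numpy as np

def press(A, y):
    """PRESS for the model with design matrix A (column of ones included),
    using the leave-one-out identity e(i) / (1 - h_ii)."""
    AtA_inv = np.linalg.inv(A.T @ A)
    e = y - A @ (AtA_inv @ A.T @ y)                  # ordinary residuals
    h = np.einsum("ij,jk,ik->i", A, AtA_inv, A)      # diagonal of the hat matrix
    return np.sum((e / (1.0 - h))**2)

def press_brute_force(A, y):
    """Same quantity from the definition: refit N times, leaving out one point."""
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        theta = np.linalg.lstsq(A[keep], y[keep], rcond=None)[0]
        total += (y[i] - A[i] @ theta)**2
    return total

# Hypothetical data: a true linear model and a candidate with an extra x^2 term
rng = np.random.default_rng(9)
N = 60
x = rng.normal(size=N)
y = 0.4 + 1.2 * x + 0.3 * rng.normal(size=N)
A1 = np.column_stack([np.ones(N), x])
A2 = np.column_stack([np.ones(N), x, x**2])
print(press(A1, y), press(A2, y))
```

Because $0 < h_{ii} < 1$, each leave-one-out error is at least as large as the ordinary residual, so PRESS is never smaller than RSS; the extra penalty grows with the number of parameters, which is what favors parsimonious models.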

Additional checks on the accuracy of the estimated parameters and a check of
the prediction qualities of the selected model are considered verification of
the model. The parameter estimates can be compared with the results from
repeated measurements under the same conditions, that is, the same flight
conditions and input forms. Further, the least-squares estimates can be
compared with estimates obtained with different techniques but the same data
and model. For this comparison, the maximum likelihood method (e.g., see
ref. 16) is recommended because of its optimal asymptotic properties. Finally,
the parameter estimates must have realistic values and should be in agreement
with wind-tunnel results and theoretical predictions.
EXAMPLES


In the following three examples, the modified stepwise regression was
applied to various sets of simulated and measured data of a general aviation
airplane. In all examples, the airplane equations of motion from appendix A
were used.


Example 1 

The purpose of this example is to demonstrate the sensitivity of the MSR
and of the criteria for model selection to measurement errors in the
dependent and independent variables of the regression equation. The simulated
data used were created by a fourth-order Runge-Kutta integration with a step
size of 0.001 sec. Equations for the aerodynamic model contained certain
nonlinear terms. The input variables $\delta_a$ and $\delta_r$ were taken from
flight measurements. The time histories of the input and some response
variables are plotted in figure 1. From these data, the three aerodynamic
coefficients $C_Y$, $C_l$, and $C_n$ were computed.

In the next step, the simulated responses $\beta$, $\alpha$, $p$, and $r$, and
the aerodynamic coefficients mentioned in the preceding paragraph, were
corrupted by Gaussian noise with zero mean and the standard errors given in
table I for the three cases considered. Case 1 represents data in which only
the dependent variable in the regression is in error. In cases 2 and 3, the
state variables are also in error. The values of simulated measurement errors
are close to those estimated from real flight data (case 1) and from ground
calibration of an instrumentation system (case 2).

Models which were determined to be adequate yielded the parameter estimates
and values of the squared multiple correlation coefficients which are listed
in table II for all three cases. Also presented are the parameters of the true
model. The $F$, PRESS, and $R^2$ values in case 1 are plotted in figure 2
against the entry number into the MSR algorithm. The computed F-values for
all three coefficients are much higher than the recommended four times the
tabulated value of three (i.e., 12), thus indicating the significance of the
regression for all models. The first values of PRESS computed from all data
points (N = 351) showed almost the same variation with the increased number of
parameters in the model as the RSS. This possibility is pointed out in
appendix B. Improvement in the PRESS criterion for model selection was
achieved by reducing the number of data points used for computing the PRESS.
The PRESS values in figure 2 were obtained from every tenth data point of the
given set.


For the data set of case 1, the MSR performed well. In the side-force
equation, the best model was the linear model supplemented by the nonlinear
term $p\alpha$. This model was selected at the minimum of PRESS and the second
maximum of $F$. The first maximum of $F$ indicates only the strong effect of
the first parameter entered in the equation for $C_Y$, which is also reflected
by $R^2 = 97.2$ percent at the first entry. The two nonlinear terms
$r\alpha^2$ and $\alpha^2$ in the true model were not selected, because the
other terms in the model had already explained 98.9 percent of the variation
of $C_Y$. It was observed, however, that the $r\alpha^2$ term was the next to
enter the chosen model. None of the estimated parameter values was
statistically different from its true value.

In the rolling-moment equation, the model selected was that of the true
model, with all parameters the same as the true values. The best model
corresponds to the extreme values of both the F and PRESS criteria. In this
equation, the linear term $\delta_r$ was not an element of the true model.
Though constrained to enter the regression (being a linear term), the
$\delta_r$ term was later eliminated as insignificant to the overall best
model.

For the yawing-moment equation, an adequate model according to the
F-criterion contains six of the seven terms in the true model, with all
parameters within $2\sigma$ of their true values, except $C_{n_{p\alpha}}$.
When, based on the minimum of the PRESS value, the remaining term $r\alpha$ is
included (entry number seven), the value of $C_{n_{p\alpha}}$ moves to within
$1\sigma$ of its true value, and $R^2$ improves from 83.4 to 85.0 percent. The
results also showed that the partial F-value for the $r\alpha$ term was equal
to or greater than the partial F-values for two of the linear terms. In this
example, the PRESS criterion performs better than the F-criterion.

Although stepwise regression assumes, in principle, that measurement noise is
present only in the dependent variable, noise was also added to the state
variables in two cases (cases 2 and 3). The parameter estimates in the higher
noise environment deviated slightly from the true values. As seen from
table II, the chosen model structures in some runs were also slightly
different from those in case 1, reflecting the overall higher noise-to-signal
ratio and an effort by the MSR to fit the noise. Furthermore, the noise in the
state variables decreased the uniqueness of the selection in both the F and
PRESS criteria (less distinct extreme values) and, in some runs, shifted the
extreme values of these criteria apart. The data in all three cases were also
analyzed by stepwise regression without the constraint on the postulated
linear terms in the model. Adequate models determined by this approach are
summarized in table III. It is apparent from these results that the
measurement errors in the data can, in some cases, lead to the determination
of an incorrect model if the constraint in the algorithm is removed. The
examples presented therefore substantiate the use of the MSR rather than
stepwise regression without the constraint.


Example 2 

In this example, the MSR technique for model structure determination was
applied to measured data. The data, sampled at 0.05 sec, represent a lateral
response of the airplane at $\alpha \approx 20^\circ$. The time histories of
the input and some response variables are plotted in figure 3. The response
variables indicate that the airplane exhibits a limit-cycle type of lateral
motion which is also strongly coupled with the short-period longitudinal
mode. In figure 4, the $F$, PRESS, and $R^2$ values for the lateral
coefficients examined are plotted against the entry number into the MSR.
An adequate model for the side-force coefficient $C_Y$ was selected at the
eighth entry, where PRESS has its minimum and F its second maximum. For the
coefficient $C_\ell$, the F-criterion indicates an adequate model at the sixth
entry and PRESS at the ninth. The difference in $R^2$ at these two entries is
only 2 percent. Therefore, following the principle of parsimony, the model
with the smaller number of parameters was selected. For the coefficient $C_n$,
the changes in the F, PRESS, and $R^2$ values after the fifth entry are
apparent. These changes indicate that the linear model (first five entries) is
completely inadequate and that some nonlinear terms must be included. An
adequate model was selected at the seventh entry, where the PRESS values have
their minimum and the F-values their first maximum. Comparisons between
measured and computed time histories of $C_n$ for the linear model and for an
adequate model are presented in figures 5 and 6, respectively. Also included
are the residual time histories and the autocorrelation functions of the
residuals. For the linear model, the fit to the data is poor. Adding the two
nonlinear terms $p\alpha$ and $r\alpha$ improved the fit substantially, and the
autocorrelation function of the residuals was close to that of an uncorrelated
random variable.
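A whiteness check on the residuals can be sketched as follows. This is an assumed, commonly used test rather than the report's exact procedure: the normalized sample autocorrelation $r(k)$ is compared with the approximate 95-percent band $\pm 1.96/\sqrt{N}$ expected for uncorrelated residuals. The residual sequences and the function names `autocorrelation` and `outside_white_band` are hypothetical.

```python
import math
import random


def autocorrelation(resid, max_lag):
    """Normalized sample autocorrelation r(k) = c(k)/c(0) of residuals, k = 0..max_lag."""
    n = len(resid)
    mean = sum(resid) / n
    d = [e - mean for e in resid]
    c0 = sum(v * v for v in d) / n
    return [sum(d[i] * d[i + k] for i in range(n - k)) / (n * c0)
            for k in range(max_lag + 1)]


def outside_white_band(resid, max_lag=20):
    """Lags k >= 1 whose r(k) lies outside the approximate 95% band +/- 1.96/sqrt(N)."""
    bound = 1.96 / math.sqrt(len(resid))
    r = autocorrelation(resid, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(r[k]) > bound]


# Hypothetical residuals: uncorrelated noise versus a strongly correlated AR(1)
# sequence of the kind an inadequate model tends to leave behind.
random.seed(2)
white = [random.gauss(0, 1) for _ in range(2000)]
ar1 = [0.0]
for _ in range(1999):
    ar1.append(0.9 * ar1[-1] + random.gauss(0, 1))

r_white = autocorrelation(white, 20)
r_ar1 = autocorrelation(ar1, 20)
print("lags outside band, white:", outside_white_band(white))
print("lags outside band, AR(1):", outside_white_band(ar1))
```

For uncorrelated residuals roughly 1 lag in 20 falls outside the band by chance, whereas correlated residuals exceed it over many consecutive lags.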

The variables included in the adequate models for the three coefficients
are summarized below, in the order in which they entered the model:

$C_Y$: $\beta, \delta_r, r, p, p\alpha, r\alpha, \alpha^2$
$C_\ell$: $\delta_a, p, \beta, \delta_r, r, p\alpha$
$C_n$: $\delta_a, p, \beta, \delta_r, r, p\alpha, r\alpha$

In figure 7, the measured output time histories are compared with those
predicted by using the models for $C_Y$, $C_\ell$, and $C_n$ determined by the MSR.
The agreement in these time histories is good except for the yawing velocity,
which could be caused by insufficient excitation of this variable.

The next step in the airplane identification was the estimation of the
parameters by the maximum likelihood (ML) method of reference 16, using the
model structure determined by the MSR. In this estimation process, the
nonlinear parameters were kept fixed at their least-squares estimates; every
attempt to estimate the whole set of aerodynamic parameters failed because the
ML algorithm diverged. The resulting ML and MSR estimates are presented in
table IV. The two methods give somewhat different estimates, mainly for the
damping derivative $C_{\ell_p}$ and the cross derivative $C_{n_p}$. These
differences might be caused by undetected modeling error and by the
correlation between linear and nonlinear parameters. Simulation studies of the
flight regime analyzed also showed that the data were very sensitive to even
small changes in certain parameters. In figure 8, the measured output time
histories are compared with those computed by the ML method.
The model structures for the three coefficients $C_Y$, $C_\ell$, and $C_n$ were
also determined by the stepwise regression without constraint. The resulting
models included the following variables for each coefficient:


P' pot / P 


3 2 

Cl p, p , Pa / p, 6^, pa 


C ; pa, p , p, 6 , p, ra 


As in the previous example with simulated data, these models differ from
those determined by the MSR. In the models for the second and third
coefficients, for example, the linear parameters $C_{n_r}$ and $C_{\ell_r}$ are
missing. When the new

aerodynamic model equations were used to predict the output variables, the
computed motion of the airplane diverged. The variables selected by the MSR
gave a model that described the motion of the airplane very well. The
variables selected by the stepwise regression without constraint fit the time
histories of $C_Y$, $C_\ell$, and $C_n$ equally well but failed to predict the
airplane motion correctly.

The physical meaning of some of the estimated nonlinear parameters can be
assessed from figure 9, where the three linear parameters $C_{Y_p}$, $C_{\ell_p}$,
and $C_{n_p}$, estimated from five test runs, are plotted against angle of
attack. The values of the parameters $C_{Y_{p\alpha}}$, $C_{\ell_{p\alpha}}$,
$C_{n_{p\alpha}}$, and $C_{n_{p\alpha^2}}$ (slopes of the solid lines) agree quite
well with the trend in the changes of $C_{Y_p}$, $C_{\ell_p}$, and $C_{n_p}$ with
$\alpha$. Also plotted in figure 9 are the ML estimates of these parameters
obtained by using the adequate models determined by the MSR.
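As a worked illustration of why a slope in figure 9 can be compared with a nonlinear parameter, consider the side-force equation. This sketch assumes the model contains both a $p$ term and a $p\alpha$ term, with $\alpha$ taken as an increment from trim as stated in appendix A. The rolling- and yawing-moment cases follow in the same way:

```latex
C_Y = \dots + C_{Y_p}\,p + C_{Y_{p\alpha}}\,p\,\alpha + \dots
\;\Longrightarrow\;
\frac{\partial C_Y}{\partial p} = C_{Y_p} + C_{Y_{p\alpha}}\,\alpha ,
\qquad
\frac{d}{d\alpha}\left(\frac{\partial C_Y}{\partial p}\right) = C_{Y_{p\alpha}}
```

A straight line fitted through local estimates of $C_{Y_p}$ taken at several angles of attack should therefore have a slope close to $C_{Y_{p\alpha}}$ when the postulated nonlinear term is physically meaningful.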


Example 3 


In the last example, the data from a longitudinal large-amplitude maneuver
were analyzed. The measured time histories of the main output and input
variables are plotted in figure 10. The MSR selected the same form of an
adequate model for both longitudinal coefficients analyzed. The terms included
in these models are $\alpha$, $\alpha^2$, $q$, $q\alpha$, and $\delta_e$. The resulting
parameters and their variations with angle of attack are plotted in figure 11.
These results are compared with the parameters obtained from 21 transient
maneuvers initiated from prestall and poststall steady-state flight regimes
(triangle symbols in fig. 11). In these 21 maneuvers, the excitation of the
motion was considerably smaller than in the maneuver shown in figure 10.


The models for the large-amplitude maneuver include a linear variation of
some parameter values with $\alpha$. These variations agree with the trend given
by the results from the small-amplitude maneuvers. The agreement was improved
by partitioning the data from the large-amplitude maneuver into six subsets
according to the values of $\alpha$. The first subset included the data with $\alpha$
varying from its minimum value to $4^\circ$. The second subset consisted of the
data corresponding to $\alpha$ between $4^\circ$ and $8^\circ$, and so forth, up to the
sixth subset, which contained the data corresponding to $\alpha$ between $20^\circ$ and
$24^\circ$. This partitioning was then repeated, starting with the subset of data
for $\alpha$ between $2^\circ$ and $6^\circ$ and ending with $\alpha$ between $22^\circ$ and $26^\circ$.
An adequate model was determined for each data subset by applying the MSR. The
resulting parameters are plotted in figure 11 (closed symbols). The parameters
from the partitioned data agree well with the results from the 21 maneuvers;
they therefore describe the variations of the parameters with $\alpha$ more
closely than the estimates from the complete data set do. This suggests that
partitioning is the preferable way of analyzing large-amplitude maneuvers.
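The overlapping partition described above can be sketched as follows. The half-open window convention ($lo \le \alpha < hi$), the use of degrees, the open lower bound of the first window (standing in for "from its minimum value"), and the sample record are assumptions for illustration; the report does not state how boundary points were assigned. The names `alpha_windows` and `partition` are hypothetical.

```python
def alpha_windows():
    """The twelve overlapping 4-deg windows described in the text, as (lo, hi) pairs."""
    # First pass: min..4, 4..8, ..., 20..24 deg (first window open below).
    first = [(float("-inf") if lo == 0 else float(lo), float(lo + 4))
             for lo in range(0, 24, 4)]
    # Second pass, shifted by 2 deg: 2..6, 6..10, ..., 22..26 deg.
    second = [(float(lo), float(lo + 4)) for lo in range(2, 26, 4)]
    return first + second


def partition(alpha_deg, windows):
    """Indices of the samples in each window, using lo <= alpha < hi."""
    return [[i for i, a in enumerate(alpha_deg) if lo <= a < hi] for lo, hi in windows]


# Hypothetical angle-of-attack record from a slow sweep between 0 and 26 deg.
alpha = [0.5 * k for k in range(53)]
subsets = partition(alpha, alpha_windows())
counts = [len(s) for s in subsets]      # number of data points per subset
print(counts)                           # [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
# The same index lists would be applied to every regressor and to the measured
# coefficient, and each subset passed separately to the model-selection step.
```

Because each sample lies in two windows, adjacent subsets share data, which smooths the parameter variations plotted against $\alpha$.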

Presented in figure 12 are the standard errors of the two longitudinal
coefficients as estimated from the residuals. The standard errors for the
partitioned data and for the small-amplitude maneuvers are in good agreement.
The standard errors for the whole set of data points in the large-amplitude
maneuver are greater than the average values from the partitioned data. This
might be caused by unexplained modeling errors in the regression equations for
these two coefficients. The number of data points in each subset is shown in
figure 13.


CONCLUDING REMARKS 

A procedure for determining airplane models from flight data has been 
developed. It starts with the airplane model formulation which uses the gen- 
eral equations of motion and postulated aerodynamic equations. The aerodynamic 
coefficients are expressed in terms of multivariable polynomials in input and 
output variables. The procedure is based on the ordinary stepwise regression 
which has been modified by adding a constraint to the parameter selection and a 
prediction sum of squares (PRESS) criterion for the model structure determina- 
tion. To finalize the procedure, some steps in model verification have been 
suggested. 
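The constraint that distinguishes the modified stepwise regression can be sketched compactly. In this minimal pure-Python illustration, which is a simplified sketch and not the report's program, the postulated linear terms are always kept in the model. Only the nonlinear candidates are entered (largest partial F at or above `f_in`) or deleted (smallest partial F below `f_out`). The threshold values 4.0 and 3.9, the names `msr` and `rss`, and the test data are hypothetical. The report also uses PRESS and residual analysis to choose among entries, which this sketch omits.

```python
import random


def rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on an intercept plus `columns`."""
    rows = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    m = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(p):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    theta = [m[i][p] / m[i][i] for i in range(p)]
    return sum((yi - sum(t * x for t, x in zip(theta, r))) ** 2 for r, yi in zip(rows, y))


def msr(linear, nonlinear, y, f_in=4.0, f_out=3.9, max_steps=50):
    """Stepwise selection with all `linear` terms forced in; only `nonlinear`
    candidates are entered or deleted by partial F tests."""
    chosen = []                                   # nonlinear terms, in order of entry

    def cols(names):
        return list(linear.values()) + [nonlinear[k] for k in names]

    def dof(names):
        return len(y) - (1 + len(linear) + len(names))

    for _ in range(max_steps):
        rss_now = rss(cols(chosen), y)
        best, best_f = None, f_in
        for name in nonlinear:                    # forward step: largest partial F
            if name not in chosen:
                trial = chosen + [name]
                rss_new = rss(cols(trial), y)
                f = (rss_now - rss_new) / (rss_new / dof(trial))
                if f >= best_f:
                    best, best_f = name, f
        if best is None:
            break
        chosen.append(best)
        rss_full = rss(cols(chosen), y)           # backward step: weakest nonlinear term
        f_del = {k: (rss(cols([c for c in chosen if c != k]), y) - rss_full)
                    / (rss_full / dof(chosen)) for k in chosen}
        weakest = min(f_del, key=f_del.get)
        if f_del[weakest] < f_out:
            chosen.remove(weakest)
    return chosen


# Hypothetical data with a strong p*alpha effect.
random.seed(3)
N = 200
alpha = [random.uniform(-1, 1) for _ in range(N)]
p = [random.uniform(-1, 1) for _ in range(N)]
y = [0.8 * a + 0.5 * b + 2.0 * a * b + random.gauss(0, 0.05) for a, b in zip(alpha, p)]
linear = {"alpha": alpha, "p": p}
nonlinear = {"p_alpha": [a * b for a, b in zip(alpha, p)],
             "alpha2": [a * a for a in alpha],
             "p2": [b * b for b in p]}
selected = msr(linear, nonlinear, y)
print(selected)
```

Keeping the linear terms out of the deletion step is what prevents the regression from trading a physically required linear term for a correlated nonlinear one, the failure seen with the unconstrained stepwise regression.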

The following are the main conclusions drawn from research work covered by 
this report: 

1. The examples with simulated and real flight data showed that the modi- 
fied stepwise regression can determine an airplane model either closer to the 
true model (simulated data) or with better prediction capabilities (real data) 
than the stepwise regression without a constraint. 

2. The PRESS criterion, in combination with computed F-values and values
of the multiple correlation coefficient, increased the ability of the procedure
to select a parsimonious model from measured data. When computing PRESS, a
limited number of data points should be used rather than the whole set,
because as the number of data points increases, PRESS approaches the residual
sum of squares, which cannot distinguish between a parsimonious and an
overfitted model.
3. In principle, the stepwise regression assumes that measurement noise is
present only in the aerodynamic coefficients (the dependent variables in the
regression equations). The results from the limited amount of simulated data
show that noise levels in the state variables corresponding to the values
obtained from ground calibration of an instrumentation system do not influence
the determination of an adequate model. For a higher noise-to-signal ratio, the
selected model can include terms that compensate for the noise in the outputs
rather than describe the airplane dynamics.

4. When the modified stepwise regression was applied to flight data in the
high-angle-of-attack regime, the nonlinear terms in the model brought the
residuals closer to an uncorrelated random sequence, and the parameters
associated with these nonlinear terms had physical meaning.

5. The modified stepwise regression, in its present form, can also be used 
for the analysis of large-amplitude maneuvers. For these maneuvers, the data 
should be partitioned according to variables which influence the existence of 
nonlinear terms in the aerodynamic model equations (e.g., the angle of attack). 

6. The model determined by the modified stepwise regression can be 
accepted if it closely predicts the airplane response and if the parameters in 
the model are close to the maximum likelihood estimates based on the same model 
structure. 

The procedure presented represents the first step toward the determination 
of an overall model of an airplane from flight data. When properly used it can 
provide results for better understanding of airplane aerodynamics at high 
angles of attack and for global stability and control analysis of an airplane 
at these flight conditions. 


Langley Research Center 

National Aeronautics and Space Administration 
Hampton, VA 23665 
August 3, 1981 
APPENDIX A


AIRPLANE EQUATIONS OF MOTION 

The airplane equations of motion are referred to the body axes. They are 
based on the assumptions that 

(1) the airplane is a rigid body

(2) the effects of spinning rotors are negligible

For the stepwise regression method, the equations of motion can be formu- 
lated as 


The aerodynamic coefficients are postulated as functions of the state and
input variables and their combinations as follows:


(a) The longitudinal coefficients as functions of
$\alpha, q, \delta_e, \alpha^2, q\alpha, \beta^2, \alpha\beta^2, \alpha^3$,

(b) The lateral coefficients and 

B, p, r, 6 , 6 / Pa, pa, ra, 6^a, 6^a, 

a r a r 

^3 ^4 ^5 ^3 2 ^3 2 3 

p, p, p, p, pa, pa, a, a, a 


and C^, as functions 

4 5 6 7 8 

a , a , a , a , a 

C as functions of 
n 

2 2 2 2 
Pa , pa , ra , 6^a , 

di 
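The polynomial candidates in list (a) can be formed from measured time histories as sketched below. The sketch assumes, as the paragraph that follows states for both model forms, that every variable is an increment from its trim value, and that product terms such as $q\alpha$ are products of those increments. The function name, the dictionary keys, and the sample values are hypothetical.

```python
def longitudinal_regressors(alpha, q, delta_e, beta, trim):
    """Candidate longitudinal terms of list (a), built from increments about trim.

    alpha, q, delta_e, beta: lists of samples; trim: dict of trim values.
    Returns a dict mapping a term name to its regressor column.
    """
    # Increments with respect to the trim values (one entry per time sample).
    da = [a - trim["alpha"] for a in alpha]
    dq = [v - trim["q"] for v in q]
    de = [d - trim["delta_e"] for d in delta_e]
    db = [b - trim["beta"] for b in beta]
    return {
        "alpha": da,
        "q": dq,
        "delta_e": de,
        "alpha^2": [a ** 2 for a in da],
        "q*alpha": [v * a for v, a in zip(dq, da)],
        "beta^2": [b ** 2 for b in db],
        "alpha*beta^2": [a * b ** 2 for a, b in zip(da, db)],
        "alpha^3": [a ** 3 for a in da],
    }


# Hypothetical single sample and trim condition (radians, rad/sec).
terms = longitudinal_regressors(alpha=[0.5], q=[0.5], delta_e=[0.125], beta=[0.5],
                                trim={"alpha": 0.25, "q": 0.0, "delta_e": 0.0, "beta": 0.0})
for name, column in terms.items():
    print(name, column)
```

Each returned column is one candidate regressor that the stepwise procedure may enter into the equation for a longitudinal coefficient.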



The variables in both model forms represent increments with respect to
their trim values. In the equation for the pitching-moment coefficient, the
term $\dot{\alpha}$ was not explicitly included, to avoid possible identification
problems. The parameters in this equation are related to the parameters in the
functional relationship

$$C_m = C_m(\alpha, \dot{\alpha}, \beta, q, \delta_e)$$

by the expressions (see ref. 17) 


C* 

m,0 


= C 


m,o(®^Z,0 


3C_ 

2V^ 


cos 


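$$C'_{m_\alpha} = C_{m_\alpha} + B\,C_{m_{\dot{\alpha}}}\,C_{Z_\alpha}$$

$$C'_{m_q} = C_{m_q} + C_{m_{\dot{\alpha}}}$$

$$C'_{m_{\delta_e}} = C_{m_{\delta_e}} + B\,C_{m_{\dot{\alpha}}}\,C_{Z_{\delta_e}}$$

$$C'_{m_{\alpha^j}} = C_{m_{\alpha^j}} + B\,C_{m_{\dot{\alpha}}}\,C_{Z_{\alpha^j}} \qquad (j = 2, 3, \ldots, 8)$$

$$C'_{m_{q\alpha}} = C_{m_{q\alpha}} + B\,C_{m_{\dot{\alpha}}}\,C_{Z_{q\alpha}}$$

$$C'_{m_{\delta_e\alpha}} = C_{m_{\delta_e\alpha}} + B\,C_{m_{\dot{\alpha}}}\,C_{Z_{\delta_e\alpha}}$$

$$C'_{m_{\beta^2}} = C_{m_{\beta^2}} + B\,C_{m_{\dot{\alpha}}}\,C_{Z_{\beta^2}}$$

$$C'_{m_{\beta^2\alpha}} = C_{m_{\beta^2\alpha}} + B\,C_{m_{\dot{\alpha}}}\,C_{Z_{\beta^2\alpha}}$$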
where 
APPENDIX B


PREDICTION SUM OF SQUARES CRITERION 
The linear regression model has the form

$$Y = X\theta + \epsilon \qquad \text{(B1)}$$

where $\theta$ is a vector of unknown parameters and $\epsilon$ is a random vector
independent of $X$ and having zero mean and covariance $\sigma^2 I$. Given the
estimates $\hat{\theta}$, the value of a future random variable $y$ with mean $x\theta$
and variance $\sigma^2$ can be predicted as $\hat{y} = x\hat{\theta}$, where $x$ is a row
vector of the matrix $X$ containing the values of the independent variables
associated with the future observation.

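A predictor $\hat{y}$ will be considered an optimal predictor if the expected
value

$$E\{(y - \hat{y})^2\} \qquad \text{(B2)}$$

has its minimum value. Equation (B2) is known as the mean square prediction
error (MSPE). Because the future observation $y$ is independent of the data used
to form $\hat{y}$, the expectation of the cross term below is zero, and the MSPE
can be expressed as

$$\begin{aligned}
E\{(y - \hat{y})^2\} &= E\{(y - \hat{y} - x\theta + x\theta)^2\} \\
&= E\{(y - x\theta)^2 + (\hat{y} - x\theta)^2 - 2(y - x\theta)(\hat{y} - x\theta)\} \\
&= E\{(\hat{y} - x\theta)^2\} + \sigma^2 \qquad \text{(B3)}
\end{aligned}$$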


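Furthermore,

$$\begin{aligned}
E\{(\hat{y} - x\theta)^2\} &= E\{[\hat{y} - E(\hat{y})]^2 + [E(\hat{y}) - x\theta]^2 + 2[\hat{y} - E(\hat{y})][E(\hat{y}) - x\theta]\} \\
&= \mathrm{Var}\{\hat{y}\} + [E\{\hat{y}\} - x\theta]^2 \qquad \text{(B4)}
\end{aligned}$$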


After substituting equation (B4) into equation (B3),

$$\mathrm{MSPE} = \sigma^2 + \mathrm{Var}\{\hat{y}\} + [E\{\hat{y}\} - x\theta]^2 \qquad \text{(B5)}$$

which means that

MSPE = Variance of the response
     + Variance of the prediction
     + Squared bias of the prediction


It is shown in reference 18 that the addition of a variable to the prediction
equation almost always increases (and never decreases) the variance of a
predicted response. This means that for two models $\hat{y}_1 = x_1\hat{\theta}_1$ and
$\hat{y}_2 = x_2\hat{\theta}_2$, where $\hat{\theta}_1$ is an $n \times 1$ vector and $\hat{\theta}_2$ is
an $(n + 1) \times 1$ vector of estimated parameters,

$$\mathrm{Var}\{\hat{y}_2\} \ge \mathrm{Var}\{\hat{y}_1\} \qquad \text{(B6)}$$

From equations (B5) and (B6), it can be concluded that, for a model with a
redundant number of parameters, the MSPE will increase from its minimal value
because of the increase in $\mathrm{Var}\{\hat{y}\}$. For an incomplete model, the
MSPE will increase because of the bias error in prediction.
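Inequality (B6) can be checked numerically. The sketch below uses the standard least-squares result $\mathrm{Var}\{\hat{y}\}/\sigma^2 = x(X^TX)^{-1}x^T$ and evaluates it for a sequence of nested models that add one regressor at a time. The random design, the new point `x_new`, and the helper names are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def variance_ratio(rows, x):
    """Var{y_hat(x)} / sigma^2 = x (X'X)^-1 x' for a least-squares fit on design `rows`."""
    p = len(x)
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    return sum(u * v for u, v in zip(x, solve(xtx, x)))


random.seed(4)
design = [[1.0] + [random.uniform(-1, 1) for _ in range(4)] for _ in range(30)]
x_new = [1.0] + [random.uniform(-1, 1) for _ in range(4)]
# Model k uses the intercept plus the first k - 1 regressors (nested models).
ratios = [variance_ratio([r[:k] for r in design], x_new[:k]) for k in range(1, 6)]
print([round(v, 4) for v in ratios])
```

The intercept-only model gives exactly $1/N$, and each added regressor can only raise the ratio, which is the variance part of the MSPE trade-off.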

For the practical implementation of the MSPE as a measure for the selection
of a parsimonious model, the prediction sum of squares (PRESS) criterion was
formulated in reference 15. It has the form

$$\mathrm{PRESS} = \sum_{i=1}^{N} \{y(i) - \hat{y}[i \mid x(1), \ldots, x(i-1), x(i+1), \ldots, x(N)]\}^2 \qquad \text{(B7)}$$

which means that PRESS uses $N - 1$ data points for the estimation and one data
point for the prediction. Equation (B7) is, however, not very convenient for
computing PRESS. A more efficient scheme is proposed in reference 15 using the
expression

$$\mathrm{PRESS} = \sum_{i=1}^{N} \left[ \frac{y(i) - \hat{y}(i)}{1 - \mathrm{Var}\{\hat{y}(i)\}/\sigma^2} \right]^2 \qquad \text{(B8)}$$

where $\hat{y}(i)$ is now based on all the data points.
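Equations (B7) and (B8) give the same value; the second form simply avoids $N$ refits. The sketch below computes PRESS both ways for a small hypothetical data set, using the leverage $x(i)(X^TX)^{-1}x^T(i)$ for the ratio $\mathrm{Var}\{\hat{y}(i)\}/\sigma^2$ in (B8). The data and the helper names are illustrative assumptions.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normal_equations(rows, y):
    """Return X'X and X'y for the design `rows`."""
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return xtx, xty


def press_direct(rows, y):
    """Eq. (B7): refit without point i, then predict y(i) from that fit."""
    total = 0.0
    for i in range(len(y)):
        xtx, xty = normal_equations(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - dot(solve(xtx, xty), rows[i])) ** 2
    return total


def press_shortcut(rows, y):
    """Eq. (B8): one fit on all points; each residual divided by 1 - h_ii."""
    xtx, xty = normal_equations(rows, y)
    theta = solve(xtx, xty)
    total = 0.0
    for r, yi in zip(rows, y):
        h = dot(r, solve(xtx, r))            # Var{y_hat(i)} / sigma^2
        total += ((yi - dot(theta, r)) / (1.0 - h)) ** 2
    return total


random.seed(6)
rows = [[1.0, random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(25)]
y = [0.3 + 1.5 * r[1] - 0.7 * r[2] + random.gauss(0, 0.1) for r in rows]
p_direct, p_short = press_direct(rows, y), press_shortcut(rows, y)
print(p_direct, p_short)
```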
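The second term in the denominator of equation (B8) can be written as

$$\frac{\mathrm{Var}\{\hat{y}(i)\}}{\sigma^2} = x(i)\,(X^T X)^{-1}\,x^T(i) \qquad \text{(B9)}$$

The behavior of equation (B9) as the number of data points increases can be
examined from its limit as $N \to \infty$. This limit can be formulated as

$$\lim_{N \to \infty} x(i)\,(X^T X)^{-1}\,x^T(i) = \lim_{N \to \infty} \frac{1}{N}\, x(i) \left( \frac{1}{N} X^T X \right)^{-1} x^T(i) = 0 \qquad \text{(B10)}$$

if $\lim_{N \to \infty} \left( \frac{1}{N} X^T X \right)$ does exist (and is nonsingular). From
equations (B8) and (B10), it is then apparent that PRESS approaches the
residual sum of squares (RSS) as the number of data points increases.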
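The limit (B10) can be seen numerically. With an intercept in the model every leverage satisfies $h_{ii} \ge 1/N$, and the average leverage is $p/N$, so for a well-spread design the ratio PRESS/RSS falls toward 1 as $N$ grows. The data-generating model and the function name below are hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def press_over_rss(n_points, seed=5):
    """Ratio PRESS/RSS for a hypothetical three-parameter linear model."""
    rng = random.Random(seed)
    rows = [[1.0, rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_points)]
    y = [1.0 + 2.0 * r[1] - 0.5 * r[2] + rng.gauss(0, 0.1) for r in rows]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    theta = solve(xtx, xty)
    rss = press = 0.0
    for r, yi in zip(rows, y):
        e = yi - sum(t * x for t, x in zip(theta, r))
        h = sum(u * v for u, v in zip(r, solve(xtx, r)))   # leverage, eq. (B9)
        rss += e * e
        press += (e / (1.0 - h)) ** 2                      # term of eq. (B8)
    return press / rss


ratios = {n: press_over_rss(n) for n in (10, 100, 1000)}
for n, ratio in ratios.items():
    print(f"N = {n:5d}: PRESS/RSS = {ratio:.4f}")
```

Once the ratio is close to 1, PRESS carries little information beyond the RSS, which is why a limited number of data points is recommended for computing it.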
TABLE II.- EFFECT OF MEASUREMENT NOISE ON MODEL STRUCTURE AND ON
PARAMETER ESTIMATES FOR SIMULATED DATA

                                          Estimate, $\hat{\theta}$
Parameter                True value    Case 1      Case 2      Case 3

$C_{Y,0}$                0.0069        0.0088      0.0088      0.0084
$C_{Y_\beta}$            -.555         -.557       -.556       -.553
$C_{Y_p}$                -.101         -.103       -.103       -.102
$C_{Y_r}$                .88           .795        .710        .640
$C_{Y_{\delta_a}}$       -.075         -.077       -.074       -.069
$C_{Y_{\delta_r}}$       .05           .050        -.056       .050
$C_{Y_{p\alpha}}$        1.34          1.44        1.60        1.53
$C_{Y_{r\alpha}}$        -.51
$C_{Y_{\alpha^2}}$       .47
$R^2$, %                               98.9        98.7        98.4

$C_{\ell,0}$             -0.00042      -0.00027    -0.0011     0.0012
$C_{\ell_\beta}$         -.11          -.108       -.107       -.105
$C_{\ell_p}$             -.15          -.145       -.141       -.139
$C_{\ell_r}$             .21           .197        .255        .228
$C_{\ell_{\delta_a}}$    -.09          -.092       -.094       -.093
$C_{\ell_{\delta_r}}$    0                                     -.003
$C_{\ell_{p\alpha}}$     1.0           1.05        .92         .87
$R^2$, %                               95.2        95.2        94.6

$C_{n,0}$                0.00099       0.00109     0.00102     0.00105
$C_{n_\beta}$            .03           .0296       .0270       .0260
$C_{n_p}$                -.063         -.063       -.064       -.065
$C_{n_r}$                -.084         -.064       -.086       -.090
$C_{n_{\delta_a}}$       .013          .013        .016        .016
$C_{n_{\delta_r}}$       -.033         -.032       -.031       -.031
$C_{n_{p\alpha}}$        .77           .359        .856        .838
$C_{n_{r\alpha}}$        -1.33         -1.34
$R^2$, %                               85.0        84.1        83.0




TABLE III.- EFFECT OF MEASUREMENT NOISE ON MODEL STRUCTURE FOR SIMULATED DATA 
DETERMINED BY STEPWISE REGRESSION WITHOUT CONSTRAINT 


Case 


Coefficient 


Cy 



True model 

B, p, r, 6 , 6 , 

3, p, r, 6 , pa 

3, p, r, 6 , 


a r 

a 



2 2 




pa, ra , a 


6 , pa , ra 
a 

1 

3, r, 6 6 , 

3/ Pf 5 , pa, r 

6 , 6 , p. 


r a 

a 

r a 


p, pa, ra^ 


pa , 3 / ra 

2 

3, 6 , r, 6 , 
r a 

3, p, 3^, pa, r 

ci 

6^, 6^, p. 


pa, p, ra^, 3^ 


pa, g, r, ra 

3 

3, 6 , r, 6 , 
r a 

3 / p / ^ / pa , 

3i 

6 , 6 , p, 

a r 


3 , <s a 
a 

^ 2 
r , 0 oc 
r 

pa , 3 , ra 



























Figure 5.- Time histories of measured and computed yawing-moment
coefficient, residuals, and autocorrelation function of residuals;
linear model.
Figure 6.- Time histories of measured and computed yawing-moment
coefficient, residuals, and autocorrelation function of resid- 
uals; adequate model. 
Figure 9.- Comparison of lateral parameters estimated from flight data using
modified stepwise regression and maximum likelihood method. 
Figure 12.- Estimated standard errors of two longitudinal coefficients.
Figure 13.- Number of data points in subsets using partitioning of data from
large-amplitude longitudinal maneuver.